Weighted Evidence Accumulation Clustering Using Subsampling

Authors

F. Jorge F. Duarte

Ana L. N. Fred

Fátima Rodrigues

João M. M. Duarte

André Lourenço

Abstract

We introduce an approach based on evidence accumulation clustering (EAC) for combining the partitions in a clustering ensemble. EAC uses a voting mechanism to build a co-association matrix from the pairwise associations observed in N partitions, with each partition given equal weight in the combination process. Applying a clustering algorithm to this co-association matrix yields the final data partition. In this paper we propose a clustering ensemble combination approach (WEACS) that uses subsampling and weights the partitions differently. We use two ways of weighting each partition: SWEACS, using a single validation index, and JWEACS, using a committee of indices. We compare combination results with the EAC technique and the HGPA, MCLA and CSPA methods by Strehl and Ghosh using subsampling, and conclude that the WEACS approaches generally obtain better results. As a complementary step to the WEACS approach, we combine all the final data partitions produced by the different variations of the method and use the Ward Link algorithm to obtain the final data partition.
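The weighted voting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made for illustration: each subsampled partition is a dict from sample index to cluster label, the per-partition weights are hypothetical stand-ins for validity-index scores, each matrix entry is normalized by the total weight of the partitions in which both samples were drawn, and the final partition is extracted with a simple single-link threshold cut rather than any algorithm named in the paper. The function names `weighted_co_association` and `single_link_cut` are likewise hypothetical.

```python
def weighted_co_association(n_samples, partitions, weights):
    """Build a weighted co-association matrix from subsampled partitions.

    partitions: list of dicts {sample_index: cluster_label}; each dict covers
                only the samples drawn in that subsample (assumed format).
    weights:    one non-negative weight per partition (assumed to come from
                some validity index; the values here are illustrative).
    """
    votes = [[0.0] * n_samples for _ in range(n_samples)]
    seen = [[0.0] * n_samples for _ in range(n_samples)]
    for labels, w in zip(partitions, weights):
        for a in labels:
            for b in labels:
                seen[a][b] += w              # pair was drawn together
                if labels[a] == labels[b]:
                    votes[a][b] += w         # pair was also grouped together
    # Assumed normalization: weighted fraction of joint draws that agreed.
    return [
        [votes[i][j] / seen[i][j] if seen[i][j] > 0 else 0.0
         for j in range(n_samples)]
        for i in range(n_samples)
    ]


def single_link_cut(co_assoc, threshold):
    """Merge samples whose co-association is >= threshold (single-link cut)."""
    n = len(co_assoc)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path compression
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if co_assoc[i][j] >= threshold:
                parent[find(i)] = find(j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy ensemble: 5 samples, 3 subsampled partitions with illustrative weights.
partitions = [
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 0, 1: 0, 2: 0, 4: 1},
    {1: 0, 2: 1, 3: 1, 4: 1},
]
weights = [1.0, 0.5, 1.0]

C = weighted_co_association(5, partitions, weights)
final_labels = single_link_cut(C, threshold=0.5)
print(final_labels)
```

With equal weights the matrix reduces to the unweighted vote fraction over joint draws, which is the equal-weight case the abstract attributes to EAC. Down-weighting the second partition reduces the influence of its grouping of samples 1 and 2.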


Similar resources

Definition of MV Load Diagrams via Weighted Evidence Accumulation Clustering using Subsampling

A definition of medium voltage (MV) load diagrams was made, based on the data base knowledge discovery process. Clustering techniques were used as support for the agents of the electric power retail markets to obtain specific knowledge of their customers’ consumption habits. Each customer class resulting from the clustering operation is represented by its load diagram. The Two-step clustering a...


Bilateral Weighted Fuzzy C-Means Clustering

The Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy C-Means (BWFCM). We used a new objective function that uses some k...


On the Consistency of k-means++ algorithm

We prove in this paper that the expected value of the objective function of the k-means++ algorithm for samples converges to the population expected value. Since k-means++, for samples, provides a constant-factor approximation for the k-means objective, such an approximation can be achieved for the population as the sample size increases. This result is of potential practical relevance when one is...


ConsensusClusterPlus (Tutorial)

Consensus Clustering [1] is a method that provides quantitative evidence for determining the number and membership of possible clusters within a dataset, such as microarray gene expression. This method has gained popularity in cancer genomics, where new molecular subclasses of disease have been discovered [3, 4]. The Consensus Clustering method involves subsampling from a set of items, such as ...


Coresets for Nonparametric Estimation - the Case of DP-Means

Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We explore the use of coresets – a data summarization technique originating from computational geometry – for this task. Coresets are weighted subsets of the data such that models trained on these coresets are provably competitive with models trained on the full dataset. Coresets sublinear in the dataset si...



Publication date: 2006
